Add a memory bound FileStatisticsCache for the Listing Table #20047
mkleen wants to merge 49 commits into apache:main from file-stats-cache
Conversation
Force-pushed a66420a → 3b33739
Force-pushed 3b33739 → 8e5560b
Force-pushed e273afc → b297378
Force-pushed 59c6bce → 4542db8
@kosiew Thank you for the feedback!

@kosiew Anything else needed to get this merged? Another approval maybe?
Force-pushed 205f96c → 92899a7
```rust
impl<T: DFHeapSize> DFHeapSize for Arc<T> {
    fn heap_size(&self) -> usize {
        // Arc stores weak and strong counts on the heap alongside an instance of T
        2 * size_of::<usize>() + size_of::<T>() + self.as_ref().heap_size()
    }
}
```
This won't be accurate:

```rust
let a1 = Arc::new(vec![1, 2, 3]);
let a2 = a1.clone();
let a3 = a1.clone();
let a4 = a3.clone();
// this should be true because all `a`s point to the same object in memory
// but the current implementation does not detect this and counts them separately
assert_eq!(a4.heap_size(), a1.heap_size() + a2.heap_size() + a3.heap_size() + a4.heap_size());
```

The only solution I can imagine is for the caller to keep track of the pointer addresses which have already been "sized" and to ignore any `Arc` which points to an address that has been "sized" earlier.
Good catch! I took this implementation from https://github.com/apache/arrow-rs/blob/main/parquet/src/file/metadata/memory.rs#L97-L102. I would suggest a follow-up here as well. We are planning to restructure the whole heap size estimation anyway.
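The pointer-tracking idea could look roughly like this (a minimal, self-contained sketch; `DedupHeapSize` and its `seen` context parameter are hypothetical illustrations, not the PR's actual API):

```rust
use std::collections::HashSet;
use std::mem::size_of;
use std::sync::Arc;

// Hypothetical variant of the heap-size trait that threads a set of
// already-counted allocation addresses through the traversal.
trait DedupHeapSize {
    fn heap_size_dedup(&self, seen: &mut HashSet<usize>) -> usize;
}

impl DedupHeapSize for Vec<i32> {
    fn heap_size_dedup(&self, _seen: &mut HashSet<usize>) -> usize {
        self.capacity() * size_of::<i32>()
    }
}

impl<T: DedupHeapSize> DedupHeapSize for Arc<T> {
    fn heap_size_dedup(&self, seen: &mut HashSet<usize>) -> usize {
        // Count the shared allocation only the first time its address is seen.
        let addr = Arc::as_ptr(self) as usize;
        if seen.insert(addr) {
            2 * size_of::<usize>() + size_of::<T>() + self.as_ref().heap_size_dedup(seen)
        } else {
            0
        }
    }
}

fn main() {
    let a1 = Arc::new(vec![1, 2, 3]);
    let a2 = a1.clone();
    let mut seen = HashSet::new();
    let total = a1.heap_size_dedup(&mut seen) + a2.heap_size_dedup(&mut seen);
    // The clone contributes nothing: its allocation was already counted.
    assert_eq!(total, a1.heap_size_dedup(&mut HashSet::new()));
}
```

The trade-off is that every `heap_size` call site has to thread the `seen` set through, which is why deferring this to the planned heap-size restructuring seems reasonable.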
@martin-g Thanks for this great review! I am on it.
Force-pushed 92899a7 → 2e3aff9
Force-pushed 365cb2f → 9b48bd8
@adriangb Could you please do another benchmark run? I see no regressions anymore when running it.

run benchmark clickbench_partitioned

@Dandandan Thank you, I broke the linter. It's fixed now. Does this affect the benchmark run?
🤖 Benchmark running (GKE): comparing file-stats-cache (3b64005) to 244f891 (merge-base) using clickbench_partitioned

🤖 Benchmark completed (GKE): results for clickbench_partitioned, base (merge-base) vs. branch
This looks much better now. There are a few minor regressions, which could be just noise. Could we do a second benchmark run?

@kosiew I added all the changes you requested. Could you do another review please?
run benchmark clickbench_partitioned

🤖 Benchmark running (GKE): comparing file-stats-cache (3b64005) to 244f891 (merge-base) using clickbench_partitioned

@mkleen you should be able to trigger benchmark runs now fwiw

🤖 Benchmark completed (GKE): results for clickbench_partitioned, base (merge-base) vs. branch
Ok, the benchmarks are looking good again. At the moment the default cache size is 20 MiB; with 10 MiB the stats for clickbench_partitioned did not fit yet. I wonder what a reasonable default value would be?
kosiew left a comment
Thanks for the updates here, this is shaping up nicely.
I still have a couple of concerns.
Some previously noted issues appear to be intentionally deferred, like the Arc heap size double-counting and the empty string parsing case. Could you clarify whether those are planned for a follow-up PR?
I also ran into two additional issues while reviewing the final diff, including what looks like a functional regression in the reader path cache wiring. Because of that, I am not yet confident that the branch is free of new problems.
Would love your thoughts on the points below.
```diff
-let table = ListingTable::try_new(config)?.with_definition(sql_definition);
+let table = ListingTable::try_new(config)?
+    .with_definition(sql_definition)
+    .with_cache(self.runtime_env().cache_manager.get_file_statistic_cache());
```
I noticed that only the registered-listing-table path was updated to call .with_cache(...). Was it intentional that the generic reader path in _read_type() still uses ListingTable::try_new(config)? without attaching the runtime cache?
With ListingTable now defaulting collected_statistics to None, could this mean that SessionContext::read_parquet, read_csv, read_json, read_avro, and similar methods no longer benefit from the file statistics cache?
If so, would this be considered a regression in the default-cache rollout? How do you see this affecting users who rely on those reader paths?
No, this was not intentional. This needs to be fixed. Thanks a lot!
```rust
Some("50M".to_owned()),
Some("1M".to_owned()),
None,
Some("1M".to_owned()),
```
I see that DEFAULT_FILE_STATISTICS_MEMORY_LIMIT was increased to 20 MiB, but RuntimeEnvBuilder::entries() still hard-codes Some("1M".to_owned()) for datafusion.runtime.file_statistics_cache_limit.
Would it make sense to update entries() to match the new default?
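For illustration, size strings like "1M" or "50M" might be interpreted along these lines (a hedged sketch; `parse_memory_limit` is a hypothetical helper, not DataFusion's actual parser, with the empty-string edge case mentioned earlier handled explicitly):

```rust
// Hypothetical parser for human-readable size strings such as "1M" or "50M".
// Sketches how a setting like file_statistics_cache_limit might be parsed,
// including rejecting the empty string.
fn parse_memory_limit(s: &str) -> Result<usize, String> {
    let s = s.trim();
    if s.is_empty() {
        return Err("memory limit must not be empty".to_string());
    }
    // Split off a trailing unit character, if any (ASCII input assumed).
    let (digits, multiplier) = match s.split_at(s.len() - 1) {
        (d, "K") | (d, "k") => (d, 1usize << 10),
        (d, "M") | (d, "m") => (d, 1usize << 20),
        (d, "G") | (d, "g") => (d, 1usize << 30),
        _ => (s, 1),
    };
    digits
        .parse::<usize>()
        .map(|n| n * multiplier)
        .map_err(|e| format!("invalid memory limit '{s}': {e}"))
}

fn main() {
    assert_eq!(parse_memory_limit("1M"), Ok(1 << 20));
    assert_eq!(parse_memory_limit("50M"), Ok(50 << 20));
    assert_eq!(parse_memory_limit("512"), Ok(512));
    assert!(parse_memory_limit("").is_err());
}
```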
Which issue does this PR close?
This change introduces a default `FileStatisticsCache` implementation for the Listing Table with a size limit, implementing the following steps following #19052 (comment):

- Add heap size estimation for file statistics and the relevant data types used in caching (this is temporary until "Add a crate for HeapSize trait" arrow-rs#9138 is resolved)
- Redesign `DefaultFileStatisticsCache` to use a `LruQueue` to make it memory-bound, following "Adds memory-bound DefaultListFilesCache" #18855
- Introduce a size limit and use it together with the heap size to limit the memory usage of the cache
- Move `FileStatisticsCache` creation into `CacheManager`, making it session-scoped and shared across statements and tables
- Disable caching in some of the SQL logic tests where the change altered the output result, because the cache is now session-scoped and not query-scoped anymore

Closes "Add a default `FileStatisticsCache` implementation for the `ListingTable`" #19217
Closes "Add limit to `DefaultFileStatisticsCache`" #19052

Rationale for this change
See above.
What changes are included in this PR?
See above.
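The memory-bound `LruQueue` redesign described in the steps above could be sketched roughly like this (a deliberately simplified, hypothetical structure with a toy `HeapSize` impl; the actual DataFusion implementation differs):

```rust
use std::collections::VecDeque;

// Hypothetical heap-size trait mirroring the PR's estimation step.
trait HeapSize {
    fn heap_size(&self) -> usize;
}

impl HeapSize for String {
    fn heap_size(&self) -> usize {
        self.capacity()
    }
}

// Minimal memory-bound LRU queue: inserting past the budget evicts the
// least-recently-used entries until the memory limit is respected again.
struct LruQueue<V: HeapSize> {
    entries: VecDeque<(String, V)>, // front = least recently used
    memory_used: usize,
    memory_limit: usize,
}

impl<V: HeapSize> LruQueue<V> {
    fn new(memory_limit: usize) -> Self {
        Self { entries: VecDeque::new(), memory_used: 0, memory_limit }
    }

    fn put(&mut self, key: String, value: V) {
        self.memory_used += key.heap_size() + value.heap_size();
        self.entries.push_back((key, value));
        // Evict from the LRU end until we are back under the limit.
        while self.memory_used > self.memory_limit {
            match self.entries.pop_front() {
                Some((k, v)) => self.memory_used -= k.heap_size() + v.heap_size(),
                None => break,
            }
        }
    }

    fn get(&mut self, key: &str) -> Option<&V> {
        let pos = self.entries.iter().position(|(k, _)| k == key)?;
        // Move the hit to the back so it becomes the most recently used.
        let entry = self.entries.remove(pos)?;
        self.entries.push_back(entry);
        self.entries.back().map(|(_, v)| v)
    }
}

fn main() {
    // Each entry here costs 13 bytes (key capacity 6 + value capacity 7),
    // so a 20-byte budget holds only one entry at a time.
    let mut cache = LruQueue::new(20);
    cache.put("file-a".to_string(), "stats-a".to_string());
    cache.put("file-b".to_string(), "stats-b".to_string()); // evicts "file-a"
    assert!(cache.get("file-a").is_none());
    assert!(cache.get("file-b").is_some());
}
```

The session-scoped sharing step would then amount to constructing one such cache in the `CacheManager` and handing out a shared handle to each table, rather than building a fresh cache per query.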
Are these changes tested?
Yes.
Are there any user-facing changes?
A new runtime setting: `datafusion.runtime.file_statistics.cache_limit`